31 The Organization of Knowledge
“self-organization”.¹⁰ In actual practice the mining is not completely autonomous;
the miner predefines classes onto which the data items will be mapped (supervised
learning from data—also known as “intelligent data analysis”), just as a real miner
generally knows what minerals he is seeking (but, to be sure, a good miner would be
open to finding and extracting other minerals that might unexpectedly occur in the
deposit). Typical tasks undertaken in practical data mining are:
1. Supervised (directed) learning:
   • Classification into the predefined classes;
   • Estimation: extracting a value for some variable from the data;
   • Prediction: classifying according to possible future behaviour, or estimating a future value of the variable of interest;
2. Unsupervised (undirected) learning:
   • Association rules (dependency modeling): determining which items belong together;
   • Clustering: grouping items according to distance on some metric (cf. Sect. 13.2);
   • Description and visualization.
These tasks are in turn embedded in a wider framework, comprising:
• Data cleansing, a complex process that can be automated as far as internal inconsistencies are concerned but that, at least at present, still requires human scrutiny of the laboratory methods used to acquire the data;
• Integration: this might merely mean merging disparate databases into a common format;
• Selection, if the entire database is not to be used: irrelevant information could be eliminated automatically during the main mining process, but eliminating it beforehand may save significant processing effort (note that the criterion for irrelevance is preset);
• Transformation: data might need to be transformed (in the same way that a mathematical object can be represented in different coordinate systems) to make the items in a merged database compatible with each other;
• Data mining proper (as described above);¹¹
• Pattern evaluation: human annotation of whatever emerges;
• Visualization.
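The framework above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the author's method: the toy records, the field names (`conc`, `temp`, `note`), the unit table `TO_MM` and the choice of k-means as the mining step are all hypothetical, chosen only to show where each stage sits. Cleansing here catches only internal inconsistencies (missing or negative values); as noted above, scrutiny of the laboratory methods cannot be automated this way.

```python
import math

# Hypothetical toy records (illustrative values only, not real measurements).
# Source A reports concentration in mM, source B in uM.
SOURCE_A = [
    {"id": "s1", "conc": 0.10, "temp": 25.0, "note": "run 1"},
    {"id": "s2", "conc": 0.12, "temp": 26.0, "note": "run 1"},
    {"id": "s3", "conc": -1.0, "temp": 25.0, "note": "run 2"},  # internally inconsistent
]
SOURCE_B = [
    {"id": "s4", "conc": 900.0, "temp": 37.0, "note": "run 3"},
    {"id": "s5", "conc": 950.0, "temp": 36.0, "note": "run 3"},
    {"id": "s6", "conc": None, "temp": 37.0, "note": "run 3"},  # missing value
]

def cleanse(records):
    """Drop records with a missing value or a negative concentration."""
    return [r for r in records
            if all(v is not None for v in r.values()) and r["conc"] >= 0]

def integrate(a, a_unit, b, b_unit):
    """Merge two sources into one list, tagging each record with its unit."""
    return [dict(r, unit=a_unit) for r in a] + [dict(r, unit=b_unit) for r in b]

RELEVANT = ("id", "conc", "unit", "temp")  # hypothetical preset relevance criterion

def select(records):
    """Keep only the preset relevant fields (here 'note' is discarded)."""
    return [{k: r[k] for k in RELEVANT} for r in records]

TO_MM = {"mM": 1.0, "uM": 1e-3}  # hypothetical conversion table

def transform(records):
    """Convert every concentration to mM so merged items are compatible."""
    return [{"id": r["id"], "conc": r["conc"] * TO_MM[r["unit"]], "temp": r["temp"]}
            for r in records]

def kmeans(points, centroids, iterations=10):
    """Plain k-means with Euclidean distance, starting from given centroids."""
    labels = []
    for _ in range(iterations):
        # assign each point to its nearest centroid
        labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        new = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # keep the old centroid if a cluster becomes empty
            new.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else c)
        if new == centroids:
            break
        centroids = new
    return labels

def mine(records):
    """Hypothetical mining step: cluster items on (conc, temp) into two groups."""
    points = [(r["conc"], r["temp"]) for r in records]
    return kmeans(points, [points[0], points[-1]])

data = transform(select(integrate(cleanse(SOURCE_A), "mM", cleanse(SOURCE_B), "uM")))
labels = mine(data)

# Pattern evaluation is left to a human: print the groups for inspection.
for r, lab in zip(data, labels):
    print(r["id"], lab)
```

Running it prints each surviving record's identifier with its cluster label (s1 and s2 in one group, s4 and s5 in the other). Deciding whether that grouping means anything remains the human pattern-evaluation step; swapping k-means for a classifier onto predefined classes would turn the mining step into the supervised case.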
In the next section we look at a specific subset of data mining.
Problem. Discuss the autonomy of the data mining process.
10 See footnote 30 in Chap. 6.
11 See Mabu et al. (2018), Table 1, or Deepthi et al. (2019) for overviews of data mining algorithms in bioinformatics.